VERTEX DATA PROCESSING WITH MULTIPLE THREADS OF EXECUTION

BACKGROUND 

The addressable and displayable basic element used to build up a computer image is
a pixel. Each pixel has several essential parameters stored as the pixel's vertex data. Typical
parameters are position data, such as an X coordinate, a Y coordinate and a Z coordinate, that
indicate the pixel's reference position in three dimensions (3D); color information, such as
diffuse color parameters (R_D, G_D, B_D, A) and specular color parameters (R_S, G_S, B_S, F), which
form the pixel's diffuse color and specular color; texture information, such as the pixel's
texture pattern and the depth of the pattern from the viewer; or any other suitable information
needed by the specific individual application. Based on the graphic standards used by an
application, parameters may be stored in different orders or formats within the vertex data.
For example, coordinate parameters may be stored in 32-bit floating-point format or
fixed-point format. The color information parameters may be stored as a simple group of 4 bytes
or as a complicated group of 16 bytes. The graphic device displays the pixel based on its
vertex data parameters.

Typical image display systems have automated several primitive draw functions using
hardware and software. For example, as shown in Figure 1a, to draw a line, the
application needs to provide only the beginning pixel point A 10 (X1, Y1, Z1) and the ending
pixel point B 12 (X2, Y2, Z2) to the graphic device 9. The graphic device 9 determines which
pixels are on the line between pixel A 10 and pixel B 12. Subsequently, the graphic device
9 sets up these pixels' color information using the A and B pixels' color parameters. If the
application wants to move the line to a new location, the new position of A 10 will be A'
14 (X1+a, Y1+b, Z1) and that of B 12 will be B' 16 (X2+a, Y2+b, Z2). If a scaling factor c is involved,
the new A' 14 pixel will be (X1*c+a, Y1*c+b, Z1) and B' 16 will be (X2*c+a, Y2*c+b, Z2).

The same principle applies to drawing a triangle, another primitive function. An 
application provides vertex data that has parameters of the three triangle end points. The 
graphic device 9 will set up the vertex data of all relevant pixels to draw the triangle. All
two-dimensional (2D) or 3D graphic objects are made up of a number of polygons which can be
broken into primitive functions, such as lines, triangles, etc. To redraw 2D or 3D graphic
objects requires redrawing the relevant primitives. The redrawing requires setting up all 
corresponding pixels' vertex data and redrawing them. All graphic operations, simple or 
complicated, are performed by manipulating the contents of pixel vertex data by 
multiplication, addition or logical operations, such as OR and exclusive OR. 

Users of personal computers or game systems utilize real-time effects on displayed 
images. In such systems, a 2D or 3D image is displayed at a rate of 30 or more frames per 
second. These rates allow the user to perceive continuous motion of objects in a scene. To 
achieve such a real-time, realistic and interactive image requires a tremendous amount of 
processing power. These effects require processing over a million graphic primitives per 
second. Typically, processing a million primitives requires multiplying and adding millions 
of floating-point and fixed-point values. 

Accordingly, it is desirable to improve the efficiency of transforming vertex data. 

SUMMARY

Multi-thread video data processing for use in a computer video display system. The 
parameters of vertex data are grouped into a plurality of groups. The computation needs of 
each group are broken down into several arithmetic operations to be performed by 
corresponding arithmetic units. The units concurrently process the vertex data. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1a illustrates two displayed line images.
Figure 1b is the vertex data of the lines of Figure 1a.
Figure 2 illustrates functional blocks of a setup engine. 
Figure 3a is a table of the basic state operations for the position data group. 
Figure 3b is a state diagram for the position data group. 
Figure 4a is a table of the basic state operations for the color information group. 
Figure 4b is the state diagram for the color information group. 
Figure 5a is a table of the basic state operations for the texture information group.
Figure 5b is the state diagram flow chart for the texture information group. 
Figure 6 illustrates the functional process flow for the transform engine. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Instead of using a traditional sequential processing approach, a multi-thread approach
to processing the vertex data may be used. As shown in Figures 1a and 1b, computer monitor
9 displays a first line with the beginning pixel point A 10 with parameters X0, Y0, Z0, W0, S0,
T0 and C0 and the end pixel point B 12 with parameters X1, Y1, Z1, W1, S1, T1 and C1 stored as
vertex data 20. That line may be modified. It may be moved to a new location, such as to
begin point 14 and end point 16. It may be scaled. It may have its specular color and texture
pattern modified. One approach to redrawing the line is to process all parameters of vertex
data 20 into new vertex data 40 before the new vertex data 40 is submitted for the line
redraw.

The transform process will be explained with reference to modifying a line's pixel 
vertex data parameters. This transform process may be used for any transformation. As 
shown in Figure 2, the transform engine 67 is a part of a setup engine 65. Vertex data is 
transformed by the transform engine 67 and processed by the other data processing engine 
68. Subsequently, the transformed and processed data is sent to raster engine 69 prior to 
output to the monitor 9. 

The transform engine 67 initially groups vertex data parameters together for
processing. The groups allow for more efficient utilization of each arithmetic unit, such as
a floating-point multiplication unit and a floating-point addition unit. One grouping scheme
groups the pixel position vertex data, the pixel color vertex data and the pixel texture vertex
data together. To illustrate for a line, the pixels' position data X0, Y0, Z0 and W0 and X1, Y1,
Z1 and W1 are selected as a first group. The pixels' color data C0 and C1 are selected as a second
group and the pixels' texture data S0, T0 and S1, T1 are selected as a third group. By analyzing
the computational requirements of each group, the required tasks can be broken down into
addition and multiplication operations. The broken-down operations are used to construct
multiplication and addition state operations. Any computation needs of the group can be
fulfilled by combining its basic state operations to achieve the final results.
Using sequential states, the addition unit may perform operations such as subtraction, move,
conversion of a floating-point number to a fixed-point number, truncate, round to even and round to odd.
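
The grouping step described above can be sketched in Python as follows. This is a minimal illustration only: the dictionary layout, the `GROUPS` table, the `group_vertex` helper and the numeric values are assumptions made for the example and are not specified in the source.

```python
from typing import Dict, Tuple

# Hypothetical mapping of group names to the vertex parameters they hold.
GROUPS: Dict[str, Tuple[str, ...]] = {
    "position": ("X", "Y", "Z", "W"),
    "color": ("C",),
    "texture": ("S", "T"),
}


def group_vertex(vertex: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Split one vertex's parameters into position, color and texture groups."""
    return {
        name: {key: vertex[key] for key in keys}
        for name, keys in GROUPS.items()
    }


# Begin pixel A of the line, with illustrative (assumed) values.
vertex_a = {"X": 1.0, "Y": 2.0, "Z": 3.0, "W": 1.0, "S": 0.25, "T": 0.5, "C": 0.8}
grouped_a = group_vertex(vertex_a)
```

Each resulting group can then be handed to its own thread of state operations.
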

To transform the position data group as shown in Figure 3a, one approach is to use
ten basic state operations 80-89. Six 80-85 of the ten basic state operations 80-89 involve
multiplication. Three state operations 86-88 involve addition and one state operation 89 is
a wait, or no operation (NOP), state operation. There is also an idle state 79. As shown in
Figure 3a, position state operation 0 80 involves multiplying the X coordinate by a scale
factor. Position state operation 8 88 involves adding an offset to the Z coordinate. For
example, the vertex data of the initial line begin pixel A 10 (X0, Y0 and Z0) transforms to A' 14
(X0' = X0*c1 + a1, Y0' = Y0*c2 + a2, Z0' = Z0*c3 + a3). The transformation will require position state
operations (PSO) 0, 6, 1, 7, 2 and 8; 80, 86, 81, 87, 82 and 88 to complete the whole
computation. Referring to Figure 3b, the different paths from one position state
operation to other position state operations are shown.
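
The position transformation can be illustrated with a short Python sketch that walks through the PSO sequence 0, 6, 1, 7, 2, 8. Only PSO 0 (multiply X by a scale factor) and PSO 8 (add an offset to Z) are stated in the text; the meaning assigned to PSO 1, 2, 6 and 7 below is inferred from that sequence and should be read as an assumption, as are the `PSO` table, the `run_position_ops` helper and the numeric values.

```python
# (unit, axis) for each position state operation used in the example.
# PSO 0 and PSO 8 follow the text; the other entries are inferred (assumed).
PSO = {
    0: ("mul", "X"), 1: ("mul", "Y"), 2: ("mul", "Z"),
    6: ("add", "X"), 7: ("add", "Y"), 8: ("add", "Z"),
}


def run_position_ops(pos, scale, offset, sequence=(0, 6, 1, 7, 2, 8)):
    """Apply the state operations in order and return the transformed position."""
    out = dict(pos)
    for pso in sequence:
        unit, axis = PSO[pso]
        if unit == "mul":
            out[axis] = out[axis] * scale[axis]   # work for the multiplication unit
        else:
            out[axis] = out[axis] + offset[axis]  # work for the addition unit
    return out


a = {"X": 2.0, "Y": 3.0, "Z": 4.0}
a_prime = run_position_ops(
    a,
    scale={"X": 2.0, "Y": 0.5, "Z": 1.0},     # c1, c2, c3
    offset={"X": 1.0, "Y": -1.0, "Z": 0.0},   # a1, a2, a3
)
```

With these values, each coordinate is scaled and then offset, matching X0' = X0*c1 + a1 and its Y and Z counterparts.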

To transform the color data group, one approach is to use ten independent color state
operations (CSO), as shown in Figure 4a. Each CSO involves only addition with one color
parameter. CSO 0-3 100-103 are related to diffuse color parameter addition, CSO 4-7
104-107 are related to specular color parameter addition, and CSO 8-9 108-109 move the
vertex data. The move operation may be performed using an addition unit. The different
paths from one color state operation to other color state operations are shown in Figure 4b.

To transform the texture data group, one approach is to use eight texture state operations
(TSOs). Six 122-127 of the TSOs are multiplication related and two 120, 121 of the TSOs
are moves which can be performed by addition. Figure 5b shows the different paths from
one TSO 120-127 to other TSOs 120-127.

By grouping the vertex data into position, color and texture groups, multiple
arithmetic units, such as a floating-point multiplication unit and a floating-point addition unit,
may be utilized more efficiently. To illustrate, if the position group is using the floating-point
multiplication unit to perform a multiplication operation, an addition
operation of either the color group or the texture group can simultaneously use the addition unit. By
continuously sending multiplication and addition operations to queues associated with the
multiplication and addition units, both units are used with
higher efficiency, accelerating data processing.

Each of these groups of operations comprises a "program", or "thread of execution",
that vies for the use of the shared arithmetic resources. Multiple controllers are typically
used, each executing a thread that can generate a sequence of instructions for the shared
arithmetic resources.

It is a common requirement that the vertex data processor be flexible enough, via 
programmability, to perform a certain subset of all of its possible operations, for any given 
graphics primitive or vertex. Since the exact operations to be performed by the transform 
engine are not known until run-time, it is desirable for the processor to respond dynamically 
to the processing workload to efficiently use the available processing resources. One 
technique for dynamic processing is to group the operations based on which function unit 
they use. Subsequently, the operations are concurrently scheduled to each function unit. 

To illustrate, as shown in Figure 6, the vertex data 140 is broken into three groups:
position group 145, color group 150 and texture group 155. The position group 145 requires
PSO 0, 6, 1, 7, 2 and 8; 80, 86, 81, 87, 82 and 88 to complete its data transformation. The
color group 150 requires CSO 0 and 8; 100 and 108 to complete its transformation. The
texture group 155 requires TSO 0 and 2; 120 and 122 to transform the texture parameters.
All multiplication state operations from the position or texture groups 145, 155 will be
queued at the multiplication queue 160 and all addition state operations from all three groups
145, 150, 155 will be queued at the addition queue 165. The queued operations of both
queues 160, 165 will be independently executed by the multiplier unit 170 and the adder unit
175. The queues are controlled by schedulers, such as an M-scheduler 181 and an A-scheduler
182.
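
The two-queue arrangement can be sketched in Python as a simple cycle-by-cycle simulation. This is a simplified illustration under stated assumptions: the operation tuples, the `step` helper and the operand values are hypothetical, and data dependencies between operations of the same thread are not modeled.

```python
from collections import deque

# Each operation: (thread name, unit, operand a, operand b). Values are illustrative.
operations = [
    ("position", "mul", 2.0, 3.0),
    ("color", "add", 0.5, 0.25),
    ("texture", "mul", 4.0, 0.5),
    ("position", "add", 6.0, 1.0),
]

# Route every operation to the queue of the unit that executes it.
mul_queue = deque()
add_queue = deque()
for op in operations:
    (mul_queue if op[1] == "mul" else add_queue).append(op)


def step(mul_q, add_q):
    """One cycle: each unit takes at most one operation from its own queue."""
    results = []
    if mul_q:
        thread, _, a, b = mul_q.popleft()
        results.append((thread, "mul", a * b))
    if add_q:
        thread, _, a, b = add_q.popleft()
        results.append((thread, "add", a + b))
    return results


cycles = []
while mul_queue or add_queue:
    cycles.append(step(mul_queue, add_queue))
```

In this sketch the four operations complete in two cycles because the multiplier and the adder each consume one operation per cycle from their own queue.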

In certain circumstances, coordination between threads is needed. For example,
intermediate results from the position thread (for example, perspective-related information)
may be required by the texture thread. A binary or counting semaphore 180 can be used to
synchronize the sequential execution of two different threads and to signal when the result
from one thread is available for the next thread to consume. The results of the executed
operations are sent to a post-processing engine 185, such as the XEOPIPE, which performs
operations such as rounding or conversion from floating-point to fixed-point format. The
buffer 195 holds the transformed vertex data until required by other processes.
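
The semaphore-based coordination between the position thread and the texture thread can be sketched with Python's standard `threading` module. The use of a reciprocal of W as the perspective-related intermediate result, the `shared` dictionary and the numeric values are assumptions made for illustration; the source does not specify them.

```python
import threading

result_ready = threading.Semaphore(0)  # binary semaphore, initially unavailable
shared = {}


def position_thread():
    # Hypothetical intermediate result: reciprocal of W (assumed, W = 4.0).
    shared["inv_w"] = 1.0 / 4.0
    result_ready.release()  # signal that the result is available


def texture_thread():
    result_ready.acquire()  # block until the position thread has produced inv_w
    shared["s_persp"] = 0.5 * shared["inv_w"]  # assumed S = 0.5


t_tex = threading.Thread(target=texture_thread)
t_pos = threading.Thread(target=position_thread)
t_tex.start()  # started first on purpose; it waits on the semaphore
t_pos.start()
t_pos.join()
t_tex.join()
```

Because the texture thread acquires the semaphore before reading the shared value, it consumes the position thread's result only after that result has been written, regardless of which thread starts first.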

CLAIMS

What is claimed is: 

1. A method for processing video image data including a plurality of different
types of data such as position, texture and color data, the method comprising:

providing tasks to be performed on each different type of data of the image data;

dividing the image data into a plurality of groups based on the type of data for 
each group, determining a set of arithmetic operations required to accomplish the tasks 
provided for the corresponding type of data; 

assigning each arithmetic operation to one of a plurality of commonly used 
arithmetic units; 

performing each arithmetic operation by the assigned arithmetic unit whereby 
each type of data is transformed in accordance with the corresponding provided tasks; and 
combining the transformed data of each group. 

2. The method of claim 1 wherein the plurality of data groups comprises a 
position group for position vertex parameters, a color group for color vertex parameters and 
a texture group for texture vertex parameters. 

3. The method of claim 1 wherein the plurality of said commonly used arithmetic
units comprises an addition unit and a multiplication unit. 

4. The method of claim 1 wherein the determining a set of arithmetic operations
for each task is based in part on a sequence of arithmetic states.

5. The method of claim 1 further comprising providing a queue for each of the
plurality of commonly used arithmetic units and wherein each assigned arithmetic operation
is sent to the queue associated with its commonly used arithmetic unit.

6. The method of claim 5 further comprising preventing the arithmetic units from 
performing the arithmetic operations of a task out of sequence. 



7. An apparatus for processing video image data including a plurality of different
types of data such as position, texture and color data, the apparatus comprising:

means for providing tasks to be performed on each different type of data of the image data;

means for dividing the image data into a plurality of groups based on the type 
of data for each group, and for determining a set of arithmetic operations required to 
accomplish the tasks provided for the corresponding type of data; 

means for assigning each arithmetic operation to one of a plurality of 
commonly used arithmetic units; 

means for performing each arithmetic operation by the assigned arithmetic unit 
whereby each type of data is transformed in accordance with the corresponding provided 
tasks; and

means for combining the transformed data of each group.



8. The apparatus of claim 7 wherein the plurality of data groups comprises a 
position group for position vertex parameters, a color group for color vertex parameters and 
a texture group for texture vertex parameters. 

9. The apparatus of claim 7 wherein the plurality of said commonly used
arithmetic units comprises an addition unit and a multiplication unit. 

10. The apparatus of claim 7 wherein for each data group, the arithmetic operation 
set comprises a set of arithmetic states and the determined operations for each task are 
defined by a sequence of the set's arithmetic states. 

11. The apparatus of claim 7 further comprising a queue for each of said 
commonly used arithmetic units and wherein each arithmetic operation is sent to the queue 
associated with its commonly used arithmetic unit. 

12. The apparatus of claim 11 further comprising means for preventing the 
arithmetic units from performing the arithmetic operations of a task out of sequence. 



13. An apparatus for performing video processing, the video processing including
performing tasks on vertex parameters, the apparatus comprising: 

a scheduler having an input configured to receive tasks and for arranging the 
vertex parameters to be processed into a plurality of groups based in part on characteristics
of the vertex parameters; 

the sequencer for each group: 

determining the tasks required to process that group's parameters, 
determining a set of arithmetic operations required to accomplish that group's tasks, each 
arithmetic operation to be performed by one of a plurality of commonly used arithmetic 
units, and sending each of the arithmetic operations of each of that group's tasks to the
arithmetic unit associated with that arithmetic operation; and 

each of said commonly used arithmetic units, having an input configured to 
receive the sent arithmetic operations and vertex parameters associated with the sent 
operations and for performing the sent arithmetic operations on the sent vertex parameters. 

14. The apparatus of claim 13 wherein the plurality of groups comprises a position
group for position vertex parameters, a color group for color vertex parameters and a texture 
group for texture vertex parameters. 

15. The apparatus of claim 13 wherein the plurality of said commonly used 
arithmetic units comprises an addition unit and a multiplication unit. 

16. The apparatus of claim 13 wherein for each group, the arithmetic operation set 
comprises a set of arithmetic states and the determined operations for each task are defined 
by a sequence of the set's arithmetic states. 

17. The apparatus of claim 13 further comprising a queue for each of said 
commonly used arithmetic units and wherein the sent arithmetic operations are sent to the 
queue associated with its commonly used arithmetic unit. 

18. The apparatus of claim 17 wherein the sequencer prevents the arithmetic units
from performing the arithmetic operations of a task out of sequence. 

ABSTRACT

Multi-thread video data processing for use in a computer video display system. The 
parameters of vertex data are grouped into a plurality of groups. The computation needs of 
each group are broken down into several arithmetic operations to be performed by 
corresponding arithmetic units. The units concurrently process the vertex data. 